ABSTRACT


The  problem  of  characterizing  the  effects  that  uncertainties  and/or  small  changes  in 
the  parameters  of  a  model  can  have  on  optimal  policies  is  considered.  It  is  shown  that 
changes  in  the  optimal  policy  are  very  difficult  to  detect  even  for  relatively  simple  models. 
By  showing  for  a  machine  replacement  problem  modeled  by  a  partially  observed,  finite 
state  Markov  decision  process,  that  the  infinite  horizon,  optimal  discounted  cost  function 
is  piecewise  linear,  we  find  formulas  to  compute  the  optimal  cost  and  the  optimal  policy, 
thus  providing  a  means  for  carrying  out  sensitivity  analyses.  Examples  are  presented  to 
show  the  usefulness  of  the  results. 

Key  words:  Sensitivity  analysis,  Markov  Decision  Processes,  Dynamic  Programming. 

1.  INTRODUCTION 


The problem of finding explicit descriptions and/or structural properties of optimal control laws and costs for partially observed (PO) stochastic control problems has received considerable attention in recent years (e.g., Ref. 1, 2). This is due in part to the computational advantages that have resulted from such descriptions (e.g., Ref. 3, 4, 5), and in part to the increasing interest in the design of adaptive control techniques aimed at overcoming changes in the optimal law caused by uncertainties and/or small changes in the parameters of the physical system being modeled (e.g., Ref. 6, 7, 8).

In this note we are interested in finding how uncertainties and/or small changes in the parameters of the model affect the optimal policy and cost of a discrete PO Markov decision process (MDP). A simple example of such a process is that associated with a machine that produces items, namely a process that can be in either a "good" or a "bad" state (corresponding to whether the machine produces or needs to be replaced). Since the state of the machine is monitored incompletely, this problem is converted to an equivalent completely observable (CO) MDP (see e.g. Ref. 3, or Ref. 9, chapter 3), in which the conditional probability vector $\pi(t) = (\pi_1(t), \ldots, \pi_N(t))$, with $\pi_i(t)$ the probability that the machine is in state $i$ ($i \in \{1, \ldots, N\}$) given past observations and actions, provides all the relevant information to select the control action at time $t$.

Even for this simple model, the effect that uncertainties in the parameters have on the optimal policy and cost is not easy to determine. The complication arises for several reasons, including the following: (i) The optimal control action has to be specified for each of the (uncountably infinite number of) values of $\pi(t)$. This should be contrasted with the case of perfect observations, where one need only compute the control for each of the finite number of values of the state; (ii) The control process for this production problem takes values in a finite set (namely, one can let the machine produce, inspect it, or replace it), and so derivatives with respect to the control are not defined; (iii) It is well known (Ref. 1-9) that the optimal cost for the problem considered here satisfies a functional equation, which can be solved by using the Dynamic Programming (DP) algorithm (see Ref. 9, chapter 5). Computationally, however, this is not an easy problem. The space in which $\pi(t)$ takes its values is discretized by means of a grid, which has to be changed several times until one is certain that the result of (applying) the algorithm is actually a solution of the functional equation, and is not just an artifact of the numerical discretization (see the examples below).

We investigate the dependence of the optimal cost and the optimal policy on any of the parameters of the model with two states before considering a more general model. The equations involved are scalar equations since $\pi(t)$ can be written as $\pi(t) = (1 - p(t), p(t))$, where $p(t)$ is the probability that the system is in the bad state at time $t$ given past observations and actions; $p(t)$ will usually be denoted by $p$, omitting explicit dependence on $t$. The scalar formulation facilitates the determination of structural properties of the cost and the policy.

2.  MODEL,  NOTATION  AND  REVIEW  OF  PREVIOUS  WORK 

We denote by $\{x_t, t = 0, 1, \ldots\}$ the finite state Markov process (MP) associated with a machine that produces items; the state space is $X = \{0, 1\}$, also referred to henceforth as {good, bad}, respectively. Denote by $\{u_t, t = 0, 1, \ldots\}$ the control process; $u_t \in U = \{0, 1, 2\}$, also referred to as {produce, inspect, replace}, respectively. We associate a cost with each control action as follows: the cost associated with replacing the machine is denoted by $R$, and the cost associated with inspection by $I$. The cost associated with production is 0 if the machine is in the good state, and $C$ if the machine is in the bad state. We assume that $0 < C < I < R$.

At time $t$ one must decide whether to inspect the item produced or not, and whether to replace the machine or not. If the machine is replaced it will be in the good state at the end of period $t$. It is further assumed that no item was produced during that period. Inspection might be carried out to determine the state of the machine, but it will not imply a decision about whether to replace the machine. The observation process $\{y_t, t = 0, 1, \ldots\}$ takes values in $Y = \{0, 1\}$.

Assume that the probability vector $p^0 = (p^0_0, p^0_1)$ is given, where $p^0_i = \Pr\{x_0 = i\}$, $i = 0, 1$, and $\pi(0) = p^0$. The machine evolves according to transition probabilities $p_{ij}(u_t)$ defined by $p_{ij}(v) = \Pr\{x_{t+1} = j \mid x_t = i, u_t = v\}$. Let $P(u_t)$, $u_t \in U$, be the transition probability matrices with entries $p_{ij}(u_t)$. The transition matrices $P(u_t)$, $u_t \in U$, are given by:

$$P(0) = P(1) = \begin{pmatrix} 1-\theta & \theta \\ 0 & 1 \end{pmatrix}, \qquad P(2) = \begin{pmatrix} 1-\theta & \theta \\ 1-\theta & \theta \end{pmatrix}, \qquad t = 0, 1, \ldots, \tag{1}$$

where $\theta$ is the probability of machine failure in one time step.
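The transition structure in (1) can be written down directly. The following is a minimal sketch, assuming states are indexed 0 = good and 1 = bad and that the failure probability is passed as `theta`; the function names `transition_matrix` and `predict_bad` are hypothetical and not part of the original paper.

```python
def transition_matrix(action, theta):
    """Return P(action) from (1) as a 2x2 nested list (0 = good, 1 = bad)."""
    if action in (0, 1):
        # Produce or inspect: a good machine fails with probability theta,
        # a bad machine stays bad.
        return [[1.0 - theta, theta],
                [0.0, 1.0]]
    if action == 2:
        # Replace: both rows equal the row of a good machine.
        return [[1.0 - theta, theta],
                [1.0 - theta, theta]]
    raise ValueError("action must be 0 (produce), 1 (inspect) or 2 (replace)")


def predict_bad(p, action, theta):
    """Probability of the bad state one step ahead, before any observation."""
    P = transition_matrix(action, theta)
    return (1.0 - p) * P[0][1] + p * P[1][1]
```

For example, `predict_bad(0.3, 0, 0.1)` evaluates $p(1-\theta) + \theta$ at $p = 0.3$, $\theta = 0.1$, while any call with `action=2` returns `theta` regardless of `p`.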

The observation process is related to the state and the control processes by means of the conditional probabilities $q_{ik}(v) = \Pr\{y_{t+1} = k \mid x_t = i, u_t = v\}$, with $q_{ik}(u_t)$ the entries of the observation matrices $Q(u_t)$, $u_t \in U$, given by:

$$Q(0) = Q(1) = Q(2) = \begin{pmatrix} q & 1-q \\ 1-q & q \end{pmatrix}, \qquad t = 0, 1, \ldots, \tag{2}$$

where $q \in [0.5, 1.0)$ is the probability of making a correct observation. The model is the one described by Ross in Ref. 10.

We are interested in the infinite horizon case, and the objective is to find an optimal admissible control policy that minimizes the expected discounted cost $J_g(p^0)$, given by:

$$J_g(p^0) = E_{p^0}\Big[\sum_{t=0}^{\infty} \beta^t c(x_t, u_t)\Big] \tag{3}$$

where $E_{p^0}(\cdot)$ denotes conditional expectation with respect to $p^0$; $\beta$ is the discount factor with $0 \le \beta < 1$; $c(x_t, u_t)$ is the cost accrued when the machine is in state $x_t$ and action $u_t$ is selected; and $g = \{g_t\}_{t=0}^{\infty}$ is an admissible policy, that is, $\{g_t\}_{t=0}^{\infty}$ is a sequence of Borel measurable maps $g_t : [0,1] \to U$ such that $u_t = g_t(p(t))$, $u_t \in U$, for $t = 0, 1, \ldots$

If no observations are available, $u_t$ can still be written as $u_t = g_t(p(t))$, $u_t \in U$, and $g_t : [0,1] \to U$ for $t = 0, 1, \ldots$, where now $p(t)$ is the (a posteriori) probability that the machine is in state 1. This is because the expected cost can be expressed explicitly in terms of $p(t)$. However, in this case $\{g_t\}$ is a deterministic sequence since $p(t)$ depends only on $p^0$, which is given, and is updated from time $t$ to time $t+1$ using the transition probabilities $P(\cdot)$, also given. If $g_t(\cdot) = g(\cdot)$ for all values of $t$, the policy is said to be stationary (when computing optimal policies in the infinite horizon case, we need only consider stationary policies; see Ref. 9, p. 225).

Define $V_\beta(p) = \inf_g J_g(p)$. Then $V_\beta(p)$ is the expected cost accrued when an optimal policy is selected, given that the machine starts in the bad state with probability $p$, and future costs are discounted at rate $\beta$. It is well known (e.g. Ref. 9, 10) that $V_\beta(p)$ is the unique solution of:

$$V_\beta(p) = \min\Big\{ Cp + \beta \sum_{k=0}^{1} D(k,p,0)\, V_\beta(T(k,p,0)),\;\; I + \beta \sum_{k=0}^{1} D(k,p,1)\, V_\beta(T(k,p,1)),\;\; R + \beta \sum_{k=0}^{1} D(k,p,2)\, V_\beta(T(k,p,2)) \Big\} \tag{4}$$

where $T(k,p,v)$ is the updated probability that the system is in state 1, given that $k$ was observed and the control applied was $v$. $T(k,p,v)$ is given by $T(k,p,v) = N(k,p,v)/D(k,p,v)$, with $N(k,p,v) = (N_1(k,p,v), N_2(k,p,v))$, $D(k,p,v) = \sum_j N_j(k,p,v)$, and where $N_j(k,p,v)$ represents the probability that the next state is $j$ given that the outcome is $k$ and the control applied is $v$ (see Ref. 9, 10 for details).
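The update $T(k,p,v)$ and the normalizer $D(k,p,v)$ amount to one step of Bayes' rule. The sketch below is only illustrative: it assumes that the observation depends on the state before the transition (so that $N_j(k,p,v) = \sum_i \pi_i\, q_{ik}(v)\, p_{ij}(v)$), indexes the states as 0 = good and 1 = bad, and uses hypothetical names (`belief_update`, `Q`). Ref. 9 and Ref. 10 should be consulted for the exact convention.

```python
def transition_matrix(action, theta):
    """P(action) with states 0 = good, 1 = bad."""
    if action in (0, 1):
        return [[1.0 - theta, theta], [0.0, 1.0]]
    return [[1.0 - theta, theta], [1.0 - theta, theta]]


def belief_update(k, p, action, theta, Q):
    """Return (D, T) for observation k and prior bad-state probability p.

    Q[action][i][k] is the probability of observing k when the state before
    the transition is i (an assumed convention, see the lead-in).
    T is None when the observation k has probability zero.
    """
    prior = [1.0 - p, p]
    P = transition_matrix(action, theta)
    # N[j]: joint probability of observing k and moving to state j.
    N = [sum(prior[i] * Q[action][i][k] * P[i][j] for i in range(2))
         for j in range(2)]
    D = N[0] + N[1]
    T = N[1] / D if D > 0.0 else None
    return D, T
```

With an uninformative observation matrix (all entries 0.5) the update reduces to $D = 1/2$ and $T = p(1-\theta) + \theta$ under production, and to $T = \theta$ under replacement, for either value of `k`.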

We review some previous work associated with the following two special cases of the strictly PO (i.e., partial observations during production and during inspection) problem: Case A: Only two actions are considered, namely $U = \{0, 2\}$, and the state of the system is not observed (i.e., the state is 'completely unobserved') during production, that is, all the entries in $Q(0)$ equal 0.5; Case B: Now $U = \{0, 1, 2\}$, and the state of the system is not observed during production, but perfect observations (i.e., the state is 'completely observed') are obtained during inspection, so that $Q(0)$ is as in Case A and $Q(1)$ is the 2-dimensional identity matrix. We recall that these two cases are of interest since the completely unobserved (CU) and the completely observed (CO) cases respectively provide upper and lower bounds for the optimal value of the cost in the strictly PO case (Ref. 11).

The first structural results associated with the optimal policy and cost for the models described above were given by Ross (Ref. 10). Among several results, Ross gave necessary and sufficient conditions for the stationary policy "produce for all values of $p$" to be optimal. Ross also showed that (i) every optimal policy produces for all $p \in [0, \theta]$; and that (ii) it is optimal to replace for values of $p$ near 1. Other results by Ross included sufficient conditions to verify the existence of optimal policies. The conditions were stated in terms of the optimal cost and were thus hard to verify. The characterization of the stationary optimal policies was done by White (Ref. 3), who showed that among the stationary optimal policies there is a smaller class of optimal policies, called structured policies, such that one need only look for structured policies when solving equation (4). Wang's work (Ref. 12) was aimed at showing that a structured policy (called 'control-limit policy' when only two actions are considered) is optimal for the two action, CU case. Wang also gave analytic expressions for computing the optimal cost and the optimal policy for this problem. Although these results can be used to show that the stationary optimal cost is piecewise linear, Wang did not do so, and unfortunately his results have been referred to primarily as a "computational procedure" (see e.g., Ref. 13). Wang studied a more general model for the two action, CU case than the one being treated here, but he did not consider the case of three actions (closed loop). In a later work (Ref. 14), Wang generalized his results to the two action, CU $N$-dimensional ($N > 2$) case.

Let us point out that the previous results characterize the optimal policy and give some properties of the optimal cost function for the problems described above, but they do not give insight into what happens to the optimal policy or to the optimal cost if there is uncertainty in the knowledge of the parameters of the model, or if these parameters undergo (unexpected) small changes. Solving the problem again via Dynamic Programming may not be practical for the infinite horizon case in terms of computational effort, as will be illustrated in the examples below.

3.  PIECEWISE  LINEAR  OPTIMAL  COST 

Motivated by our interest in determining the sensitivity of the optimal policy with respect to the parameters of the model, and since the study of the functional equation (4) provides little insight on how the optimal policy or the optimal cost change when the parameters of the model are subject to small changes, we focused our attention on the study of the DP (or successive approximations, Ref. 9, 10) algorithm used to solve equation (4). Specifically, we analyzed the algorithm given by:

$$V_\beta^0(p) = \min\{Cp,\; I,\; R\}$$

$$V_\beta^n(p) = \min\Big\{ Cp + \beta \sum_{k=0}^{1} D(k,p,0)\, V_\beta^{n-1}(T(k,p,0)),\;\; I + \beta \sum_{k=0}^{1} D(k,p,1)\, V_\beta^{n-1}(T(k,p,1)),\;\; R + \beta \sum_{k=0}^{1} D(k,p,2)\, V_\beta^{n-1}(T(k,p,2)) \Big\} \tag{5}$$

where $n$ represents the iteration, and $V_\beta^n(p)$ is the minimal cost that can be obtained starting in state 1 with probability $p$ and proceeding for $n$ stages with costs discounted by a factor $\beta$. Because of the relative simplicity of algorithm (5), and since from the theory of contraction mappings it is guaranteed that algorithm (5) converges uniformly to the unique solution $V_\beta(p)$ of equation (4) as $n \to \infty$ (see e.g. Ref. 2), we were able to prove the piecewise linearity of the optimal cost function, and obtained analytic expressions to compute the optimal cost and the optimal policy. We show this next.
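To make the grid-based use of algorithm (5) concrete, the following sketch runs successive approximations for the two action (produce, replace) problem with no observations during production, so that the belief moves to $p(1-\theta) + \theta$ after producing and to $\theta$ after replacing. The uniform grid, the linear interpolation, the stopping rule and all function names are assumptions made for illustration; they are not the numerical scheme used in the paper.

```python
def interp(values, p):
    """Linear interpolation at p in [0, 1] of values given on a uniform grid."""
    n = len(values) - 1
    x = min(max(p, 0.0), 1.0) * n
    lo = min(int(x), n - 1)
    w = x - lo
    return (1.0 - w) * values[lo] + w * values[lo + 1]


def successive_approximations(C, R, beta, theta, n_grid=200, tol=1e-9,
                              max_iter=20000):
    """Iterate the two action version of (5) on a grid until the sup-norm
    change between iterates drops below tol. Returns (grid, V, iterations)."""
    grid = [i / n_grid for i in range(n_grid + 1)]
    V = [min(C * p, R) for p in grid]          # V^0 restricted to two actions
    iteration = 0
    for iteration in range(1, max_iter + 1):
        replace_cost = R + beta * interp(V, theta)
        new_V = [min(C * p + beta * interp(V, p * (1.0 - theta) + theta),
                     replace_cost)
                 for p in grid]
        gap = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if gap < tol:
            break
    return grid, V, iteration
```

A call such as `successive_approximations(4.0, 10.0, 0.9, 0.1)` returns a grid approximation of the cost. The location of the switch between "produce" and "replace" is then only known up to the grid spacing, which is the kind of discretization uncertainty the analytic expressions are meant to avoid.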

Consider first Case A described above. When $q = 0.5$ we have $D(k,p,0) = D(k,p,2) = 1/2$, $T(k,p,2) = \theta$, and $T(k,p,0)$ becomes (Ref. 9):

$$T(k,p,0) = p(1-\theta) + \theta \equiv Tp \tag{6}$$

Thus, $Tp$ satisfies $Tp > p$ for $p \in [0,1)$, with unique fixed point $p = 1$. From algorithm (5), denote by $P_k(p)$ the function generated by $P_k(p) \equiv Cp + \beta P_{k-1}(Tp)$, $k = 2, 3, \ldots$, with $P_1(p) = Cp$, and denote by $R_k$ the function generated using $R_k \equiv R + \beta P_{k-1}(\theta)$, $k = 2, 3, \ldots$, with $R_1 \equiv R$. By applying (5) recursively one obtains that:

$$P_k(p) = Cp \sum_{i=0}^{k-1} \beta^i (1-\theta)^i + C \sum_{i=1}^{k-1} \beta^i \big(1 - (1-\theta)^i\big) \tag{7}$$

$$R_k = R + C \sum_{i=1}^{k-1} \beta^i \big(1 - (1-\theta)^i\big) \tag{8}$$
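The closed forms (7) and (8) can be checked against the recursions that define $P_k$ and $R_k$. The sketch below assumes that the sums run up to $i = k-1$ (which is what the recursion with $P_1(p) = Cp$ implies) and that the argument of $P_{k-1}$ in the recursion for $R_k$ is $\theta$, the belief reached after a replacement. Function names are hypothetical.

```python
def P_recursive(k, p, C, beta, theta):
    """P_k(p) = C p + beta P_{k-1}(T p), with P_1(p) = C p."""
    if k == 1:
        return C * p
    return C * p + beta * P_recursive(k - 1, p * (1.0 - theta) + theta,
                                      C, beta, theta)


def P_closed(k, p, C, beta, theta):
    """Closed form (7), sums assumed to stop at i = k - 1."""
    slope_sum = sum((beta * (1.0 - theta)) ** i for i in range(k))
    const_sum = sum(beta ** i * (1.0 - (1.0 - theta) ** i) for i in range(1, k))
    return C * p * slope_sum + C * const_sum


def R_recursive(k, C, R, beta, theta):
    """R_k = R + beta P_{k-1}(theta), with R_1 = R."""
    if k == 1:
        return R
    return R + beta * P_recursive(k - 1, theta, C, beta, theta)


def R_closed(k, C, R, beta, theta):
    """Closed form (8), sum assumed to stop at i = k - 1."""
    return R + C * sum(beta ** i * (1.0 - (1.0 - theta) ** i)
                       for i in range(1, k))
```

Under these assumptions the recursive and closed-form versions agree to floating point accuracy for any $k \ge 1$, which also shows that $R_k - P_k(0) = R$ for every $k$.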

Since $C < R$ by hypothesis, and $p \le 1$, the first iteration of (5) gives the policy "produce for all $p \in [0,1]$". As $k$ increases, if $P_k(p) \le R_k$ for all $k \in N$, and all $p \in [0,1]$, one obtains Ross' result mentioned above, namely, that "produce for all $p \in [0,1]$" is the stationary, infinite horizon optimal policy. If on the contrary $P_k(p) = R_k$ for some $k \in N$ and for some $p < 1$, call it $\alpha_k$, then the optimal cost at iteration $k$ will be specified by:

$$V_\beta^k(p) = \begin{cases} Cp \sum_{i=0}^{k-1} \beta^i (1-\theta)^i + C \sum_{i=1}^{k-1} \beta^i \big(1 - (1-\theta)^i\big), & p \in [0, \alpha_k), \\ R + C \sum_{i=1}^{k-1} \beta^i \big(1 - (1-\theta)^i\big), & p \in [\alpha_k, 1], \end{cases} \tag{9}$$

with optimal policy "produce for $p \in [0, \alpha_k)$ and replace for $p \in [\alpha_k, 1]$". The point we want to make is the following: for $n = k+1$ (and similarly for subsequent iterations), in order to compute $V_\beta^{k+1}(p)$ we need to perform a minimization in an interval of the form $[0, b)$, $0 < b < 1$, requiring the evaluation of $V_\beta^k(Tp)$, $p \in [0, b)$. But since $Tp > p$ for $p < 1$, the result of the minimization only specifies the cost function in an interval of the form $[0, T^{-1}b)$ at iteration $k+1$. This implies the following:

(i) Since $p = 1$ is the unique fixed point of $T^{-1}p = (p - \theta)/(1 - \theta)$, and $T^{-1}p$ continues to decrease the size of the interval of the form $[0, b)$ as $k \to \infty$, there is an iteration, call it $l$, for which this interval is smaller than $[0, \theta)$ (say $[0, \gamma)$, $\gamma < \theta$). Since $T^{-1}p < 0$ for $p \in [0, \gamma)$, the cost specified in $[0, \gamma)$ in iteration $l$ will not enter into the computation of $V_\beta^{l+1}(p)$, $V_\beta^{l+2}(p), \ldots$

(ii) To specify the cost function on the remainder of the interval, namely $[T^{-1}\alpha_k, 1]$, algorithm (5) requires a minimization of the form:

$$\min\{Cp + \beta R_k,\; R_{k+1}\} \tag{10}$$

Note that in (10) we are comparing a function associated with the produce action, given by an affine function of $p$ (referred to henceforth simply as a 'line segment' in order to facilitate the exposition), and $R_{k+1}$, a constant. If $Cp + \beta R_k$ is smaller than $R_{k+1}$ in (10) (for some value of $p \in [T^{-1}\alpha_k, 1]$), a new line segment will appear in the description of the cost function (for the interval $p \in [T^{-1}\alpha_k, \alpha_{k+1})$). The point here is that with the exception of the line segment shown in (9), all the line segments appearing in the description of the cost function come from a minimization of the form (10).

In other words, from (ii), at each iteration of (5), and independently of the number of line segments already describing the cost function in that iteration, there is at most one new line segment appearing in the description of the optimal cost function, and from (i), for $k$ large enough the line segments are also leaving the problem (meaning they no longer appear in the cost function) under the action of $T^{-1}p$.

(iii) In addition, observe that the line segment that specifies the cost function at iteration (say) $k$ in the interval $[a, b)$, $0 < a < b < 1$, specifies (with formula updated by the iterative procedure) the cost function at iteration $k+1$ in the interval $[T^{-1}a, T^{-1}b)$, and since for $0 < a < b < 1$ we have that $T^{-1}b - T^{-1}a = (b - a)/(1 - \theta) > b - a$, the line segments leaving the problem have finite nonzero length.

Finally, if we call $\alpha^*$ the limit as $k \to \infty$ of $\alpha_k$ (whenever it exists), then there is a finite natural number $m_{\alpha^*}$ such that $T^{-m_{\alpha^*}}\alpha^*$ is less than zero (this is clear because $T^{-1}p < p$ for $0 \le p < 1$, and because $p = 1$ is the only fixed point of $T^{-1}p$). The same is true for each of the $\alpha_k$, $k \in N$. This observation, together with remarks (i), (ii) and (iii) above, means that all the line segments that appear in the description of the cost function disappear in a finite number of iterations.
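The finiteness argument can be illustrated numerically. The sketch below, with hypothetical function names, applies $T^{-1}p = (p - \theta)/(1 - \theta)$ repeatedly to a starting point $\alpha < 1$ and records the points visited while they remain nonnegative; the gaps between consecutive points grow by the factor $1/(1 - \theta)$ at each step, as noted in remark (iii).

```python
def T_inv(p, theta):
    """Inverse of T p = p (1 - theta) + theta."""
    return (p - theta) / (1.0 - theta)


def preimage_chain(alpha, theta):
    """Return [alpha, T^{-1} alpha, T^{-2} alpha, ...], stopping before the
    first negative value. Requires 0 <= alpha < 1 and 0 < theta < 1, so that
    the sequence strictly decreases and the loop terminates."""
    if not (0.0 <= alpha < 1.0 and 0.0 < theta < 1.0):
        raise ValueError("need 0 <= alpha < 1 and 0 < theta < 1")
    chain = [alpha]
    while T_inv(chain[-1], theta) >= 0.0:
        chain.append(T_inv(chain[-1], theta))
    return chain


def gaps(chain):
    """Lengths of the intervals between consecutive points of the chain."""
    return [chain[i] - chain[i + 1] for i in range(len(chain) - 1)]
```

The length of the list returned by `preimage_chain` is the number of applications of $T^{-1}$ needed to push the starting point below zero, that is, the quantity written $m_{\alpha^*}$ in the text when the starting point is $\alpha^*$.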

We introduce the following notation. Let $W_\beta^k(p) \equiv V_\beta^k(p)|_{[0, \alpha_k)}$, that is, $W_\beta^k(p)$ denotes the restriction of the optimal cost function at iteration $k$ to the interval $[0, \alpha_k)$. Observe that in this notation $R_k$ can be interpreted as the restriction of $V_\beta^k(p)$ to the interval $[\alpha_k, 1]$. Let $W_\beta(p)$ and $R^*$ be the limits (whenever they exist) of $W_\beta^k(p)$ and $R_k$ respectively, as $k \to \infty$. Then $W_\beta(p)$ (respectively $R^*$) denotes the restriction of the infinite horizon, optimal discounted cost function $V_\beta(p)$ to the interval $[0, \alpha^*)$ (respectively $[\alpha^*, 1]$).

One of two things can happen so that convergence is achieved: either (a) the line segments enter into the problem at a higher rate than that at which they disappear from the problem, and hence the line segments accumulate, meaning that in the limit as $k \to \infty$, $W_\beta(p)$ will not be described by a finite number of line segments; or else (b) the rate at which the line segments enter the problem is the same as that at which they leave the problem, and hence a finite number of line segments completely describe $W_\beta(p)$. We have the following Proposition.


Proposition 3.1: For the open loop model described above in Case A (two actions, i.e., $U$ = {produce, replace}, and the state of the system is CU during production), $W_\beta(p)$ associated with the stationary, infinite horizon optimal policy is piecewise linear. Since the limit $R^*$ of the constants $R_k$ is itself a constant, this means that the infinite horizon, optimal cost function $V_\beta(p)$ is piecewise linear.

Proof: By Lemma 3.2 in Ref. 10, we have that every optimal policy produces for all $p \in [0, \theta]$. Hence we consider the following two cases:

(i) If $\alpha^* = \theta$, then $T^{-1}\alpha^* = 0$, and the claim here is that the optimal cost $V_\beta(p)$ associated with the stationary, infinite horizon optimal policy "produce for $p \in [0, \theta]$ and replace for $p \in (\theta, 1]$", is given by:

$$V_\beta(p) = \begin{cases} Cp + \beta R^*, & p \in [0, \theta], \\ R^*, & p \in (\theta, 1], \end{cases} \tag{11}$$

where $R^*$ denotes the limit of $R_k$ as $k \to \infty$, that is, the constant value of the cost on the replace region. For if not, assume that $W_\beta(p)$ is an arbitrary limit of piecewise linear functions. Denote by $f(p)$ the optimal cost function that is an arbitrary limit of piecewise linear functions for $p \in [0, \alpha^*]$, and a constant for $p \in (\alpha^*, 1]$. Then from (4) we have that $f(p) = \min\{Cp + \beta f(Tp),\; R + \beta f(\theta)\}$. That is, by Lemma 3.2 in Ref. 10, and for $0 \le p \le \alpha^* = \theta$, we have $f(p) = Cp + \beta f(Tp)$. Since $f(Tp) = R + \beta f(\theta)$ is constant (say $K$) for $0 < p \le \alpha^* = \theta$, we have $f(p) = Cp + \beta K$ for $p \in [0, \alpha^*]$. Therefore, if $\alpha^* = \theta$, the infinite horizon, optimal cost is given by (11), which means that there is only one line segment describing $W_\beta(p)$.

(ii) Since the case for which $\alpha^* = 1$ was considered by Ross in Ref. 10, assume that $\theta < \alpha^* < 1$, and that convergence takes place as described in (a) above. As explained in remark (ii) above, the line segment that appeared in the problem most recently specifies the cost function in the interval $[T^{-1}\alpha^*, \alpha^*)$. Since $\alpha^* > T^{-1}\alpha^*$, this line segment has finite nonzero length. Now assume that for $p \in [0, T^{-1}\alpha^*)$ the cost function is arbitrary. Denote by $f(p)$ the optimal cost function that is an arbitrary limit of piecewise linear functions for $p \in [0, T^{-1}\alpha^*)$, an affine function of $p$ for $p \in [T^{-1}\alpha^*, \alpha^*)$, and constant for $p \in [\alpha^*, 1]$. From (4), $f(p)$ is given by $f(p) = \min\{Cp + \beta f(Tp),\; R + \beta f(\theta)\}$. Since for $p \in [T^{-1}(T^{-1}\alpha^*), T^{-1}\alpha^*)$ the optimal policy is to produce, $f(p) = Cp + \beta f(Tp)$ for $p \in [T^{-1}(T^{-1}\alpha^*), T^{-1}\alpha^*)$. But for $p \in [T^{-1}(T^{-1}\alpha^*), T^{-1}\alpha^*)$ we have $Tp \in [T^{-1}\alpha^*, \alpha^*)$, and so $f(Tp)$ is the above mentioned segment in $[T^{-1}\alpha^*, \alpha^*)$ with finite (nonzero) length, which in turn means that $f(p) = Cp + \beta f(Tp)$ is also an affine function of $p$, and since $T^{-1}\alpha^* > T^{-1}(T^{-1}\alpha^*)$, it also has a finite (nonzero) length. By remark (iii) above, we have that $T^{-1}\alpha^* - T^{-1}(T^{-1}\alpha^*) = (\alpha^* - T^{-1}\alpha^*)/(1 - \theta) > \alpha^* - T^{-1}\alpha^*$, so each new segment is at least as long as the one that generated it. Therefore, continuing the procedure just described, we can see that there is a uniform lower bound (different from zero) for the length of all the line segments which is independent of the iteration because $T^{-1}p$ is independent of the iteration. Taking into account the previous observations, we conclude that this uniform lower bound for the length of the line segments implies an upper bound on the number of line segments describing $V_\beta(p)$. QED

At this point the following remarks are in order:

(i) To the best of our knowledge, the piecewise linearity of the optimal cost function in the infinite horizon model for the cases described above has not been reported previously (see for example all the references cited so far, and recent reviews like Ref. 16). Its usefulness will be apparent in the sequel.

(ii) Although the two action, CU model may be of limited interest in practice (namely, it may only be used in some replacement models in which the equipment is subject to breakdowns but no measurements are available, and therefore only open loop control is applicable), its importance here resides in that the insight obtained by its study allowed us to perform a similar analysis (and also prove the piecewise linearity of the optimal cost function) for the three action, closed loop case.

(iii) The analysis of the successive approximations algorithm for the case of three actions is not trivial. As pointed out in Ref. 15, p. 29, a policy can appear at any time during the iterative procedure, yet fail to be optimal for the infinite horizon case. Furthermore, it might also happen that a policy appears during some iteration, does not appear in the (e.g., one hundred) subsequent iterations, and reappears later (or never reappears), meaning that a policy structure which is no longer optimal after some finite iteration $k \in N$ cannot be eliminated as suboptimal. As a consequence, estimating the minimum number of iterations required (for example in algorithm (5)) to guarantee that an optimal policy for the finite horizon is also the optimal policy for the infinite horizon case remains an open problem (Ref. 15).

For the sake of brevity, we will not go into the details of the three action, closed loop problem, since it requires a lengthy analysis of the successive approximations algorithm (5). However, we have studied the algorithm (5) and shown that some policy structures cannot occur at all during the iterative procedure (e.g., there is not a "produce-inspect" policy), and that the occurrence of others does not affect the stationary, infinite horizon policy structure (Ref. 17).

With these results we were able to show that the infinite horizon, optimal cost function in the three action, closed loop problem is piecewise linear, and were able to develop analytic expressions (equivalent to those of Wang in Ref. 12 for the two action, CU case, and new ones for the three action, closed loop case) for the costs and the structured optimal policies for Cases A and B (namely, policies that as a function of $p$ have the characterization: "there exist three numbers $p_i$, $i = 1, 2, 3$, $0 \le p_1 \le p_2 \le p_3 \le 1$, such that it is optimal to produce for $0 \le p < p_1$ ($0 \le p \le p_1$ if $p_1 = 0$) and $p_2 \le p < p_3$, it is optimal to inspect for $p_1 \le p < p_2$, and it is optimal to replace for $p_3 \le p \le 1$"; see Ref. 3, 10 for detailed analysis). This in turn allows us to avoid the computational burden described in Section 2 when solving the control problem: in particular, the results allow us to perform a sensitivity analysis of the optimal policy with respect to any of the parameters of the problem, as will be illustrated in the examples ahead.

Once it is established that only a finite number of line segments is required to completely describe the infinite horizon optimal cost function, algorithm (5) can be used to find analytic expressions to compute the cost and the policy structure. Since the importance of these formulas resides in their use to perform sensitivity analyses of the optimal cost and policy, we illustrate this next with some examples.

4.  EXAMPLES

Example 1: Consider the closed loop problem with the following data: $\beta = 0.985$, $\theta = 0.1$, $C = 4.0$, $I = 5.56$ and $R = 10.0$. The stationary, infinite horizon optimal policy is: "produce for $p \in [0, 0.5145193)$ and $p \in [0.5369462, 0.6070793)$, inspect for $p \in [0.5145193, 0.5369462)$ and replace for $p \in [0.6070793, 1]$", and the associated optimal cost is given by:

$$V_\beta(p) = \begin{cases}
23.33618\,p + 151.88782, & p \in [0.0000000, 0.0864824) \\
21.81182\,p + 152.01965, & p \in [0.0864824, 0.1778342) \\
20.09230\,p + 152.32544, & p \in [0.1778342, 0.2600507) \\
18.15262\,p + 152.82985, & p \in [0.2600507, 0.3340457) \\
15.96460\,p + 153.56075, & p \in [0.3340457, 0.4006411) \\
13.49645\,p + 154.54959, & p \in [0.4006411, 0.4605770) \\
10.71230\,p + 155.83191, & p \in [0.4605770, 0.5145193) \\
7.57168\,p + 157.44782, & p \in [0.5145193, 0.5369462) \\
7.54600\,p + 157.46161, & p \in [0.5369462, 0.5634215) \\
4.00000\,p + 159.45950, & p \in [0.5634215, 0.6070793) \\
161.88782, & p \in [0.6070793, 1.0000000]
\end{cases} \tag{12}$$
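One way to probe a closed form such as (12) is to substitute it back into the functional equation (4). The sketch below does this under the assumption that the example follows Case B: no information is obtained during production (belief moves to $p(1-\theta) + \theta$), inspection reveals the current state perfectly (belief moves to $\theta$ after a "good" reading and to 1 after a "bad" one), and replacement leads to belief $\theta$. The names `SEGMENTS`, `V`, `bellman_rhs` and `max_residual` are hypothetical, and the coefficients are those printed in (12).

```python
BETA, THETA, C, I, R = 0.985, 0.1, 4.0, 5.56, 10.0

# (slope, intercept, left endpoint) of each piece of (12), in increasing p.
SEGMENTS = [
    (23.33618, 151.88782, 0.0000000),
    (21.81182, 152.01965, 0.0864824),
    (20.09230, 152.32544, 0.1778342),
    (18.15262, 152.82985, 0.2600507),
    (15.96460, 153.56075, 0.3340457),
    (13.49645, 154.54959, 0.4006411),
    (10.71230, 155.83191, 0.4605770),
    (7.57168, 157.44782, 0.5145193),
    (7.54600, 157.46161, 0.5369462),
    (4.00000, 159.45950, 0.5634215),
    (0.00000, 161.88782, 0.6070793),
]


def V(p):
    """Evaluate the piecewise linear cost (12) at p in [0, 1]."""
    slope, intercept = SEGMENTS[0][0], SEGMENTS[0][1]
    for s, c, left in SEGMENTS:
        if p >= left:
            slope, intercept = s, c
    return slope * p + intercept


def bellman_rhs(p):
    """Right-hand side of (4) under the assumed Case B observation model."""
    produce = C * p + BETA * V(p * (1.0 - THETA) + THETA)
    inspect = I + BETA * ((1.0 - p) * V(THETA) + p * V(1.0))
    replace = R + BETA * V(THETA)
    return min(produce, inspect, replace)


def max_residual(n=2000):
    """Largest |V(p) - right-hand side of (4)| over a uniform grid."""
    return max(abs(V(i / n) - bellman_rhs(i / n)) for i in range(n + 1))
```

Under these assumptions the printed coefficients reproduce the right-hand side of (4) to roughly three decimal places; residuals of that order are to be expected from the rounding of the printed values and from any difference between the assumed observation model and the one actually used for the example.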


Observe that equation (12) is a closed form formula for the infinite horizon discounted cost. Also, note that there are 7 line segments describing the optimal cost function for $p \in [0.0, 0.5145193)$, and 2 line segments describing the optimal cost function for $p \in [0.5369462, 0.6070793)$. Now change $\theta$ from 0.1 to 0.10005. The optimal policy structure now changes from 4 to 2 regions. In this case $\alpha^* = 0.60721$, and the optimal cost is given by:

$$V_\beta(p) = \begin{cases}
23.3210\,p + 151.9233, & p \in [0.00000, 0.08713) \\
21.7959\,p + 152.0562, & p \in [0.08713, 0.17846) \\
20.0754\,p + 152.3632, & p \in [0.17846, 0.26065) \\
18.1346\,p + 152.8691, & p \in [0.26065, 0.33463) \\
15.9452\,p + 153.6017, & p \in [0.33463, 0.40120) \\
13.4753\,p + 154.5927, & p \in [0.40120, 0.46111) \\
10.6890\,p + 155.8774, & p \in [0.46111, 0.51502) \\
7.5458\,p + 157.4962, & p \in [0.51502, 0.56355) \\
4.0000\,p + 159.4945, & p \in [0.56355, 0.60721) \\
161.9233, & p \in [0.60721, 1.00000]
\end{cases} \tag{13}$$
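The structure of (13) can be related to the map $T^{-1}p = (p - \theta)/(1 - \theta)$ of Section 3. The sketch below, which uses the printed values of (13) and hypothetical names, checks two things on the produce region: that each breakpoint is (up to rounding) the image under $T^{-1}$ of the next one, and that consecutive slopes satisfy $s_j = C + \beta(1 - \theta)\, s_{j+1}$, which is what the relation $V_\beta(p) = Cp + \beta V_\beta(Tp)$ implies for affine pieces. It is a consistency check under these stated relations, not a derivation of (13).

```python
BETA, THETA, C = 0.985, 0.10005, 4.0

# Slopes and right endpoints of the nine produce segments printed in (13).
SLOPES = [23.3210, 21.7959, 20.0754, 18.1346, 15.9452,
          13.4753, 10.6890, 7.5458, 4.0000]
RIGHT_ENDS = [0.08713, 0.17846, 0.26065, 0.33463, 0.40120,
              0.46111, 0.51502, 0.56355, 0.60721]


def T_inv(p, theta=THETA):
    """Inverse of T p = p (1 - theta) + theta."""
    return (p - theta) / (1.0 - theta)


def breakpoint_errors():
    """|T^{-1}(b_{j+1}) - b_j| for consecutive printed breakpoints."""
    return [abs(T_inv(RIGHT_ENDS[j + 1]) - RIGHT_ENDS[j])
            for j in range(len(RIGHT_ENDS) - 1)]


def slope_errors():
    """|s_j - (C + beta (1 - theta) s_{j+1})| for consecutive printed slopes."""
    factor = BETA * (1.0 - THETA)
    return [abs(SLOPES[j] - (C + factor * SLOPES[j + 1]))
            for j in range(len(SLOPES) - 1)]
```

Both lists of errors stay at the level of the rounding of the printed figures, and `T_inv(RIGHT_ENDS[0])` is negative, so the chain of breakpoints generated from $\alpha^* = 0.60721$ ends after a finite number of steps, as argued in Section 3.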


Note how a relatively small change in the value of θ resulted in a significant change in
the optimal policy structure. To the best of our knowledge, changes in the optimal policy
due to such small changes in the parameters of the model could not be studied before,
because the necessity of discretization often does not permit high confidence in the
results obtained by following the DP algorithm. We are able to study small changes in the
parameters of the model because of the analytic expressions found as a consequence of
the piecewise linearity of the optimal cost.
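
As an illustration of how such an analytic expression can be used, the following minimal Python sketch tabulates the segments of equation (13) and evaluates the optimal cost at any p. The names `SEGMENTS_13`, `optimal_cost` and `max_jump` are hypothetical and are not part of the original work; the subscript β in the comments assumes the cost function in (13) is the discounted cost. The coefficients are copied from (13) as printed, so adjacent segments agree at the breakpoints only up to the rounding of the printed digits.

```python
from bisect import bisect_right

# Segments of equation (13) as (left endpoint, slope, intercept).
# Each segment extends to the left endpoint of the next one; the last ends at p = 1.
SEGMENTS_13 = [
    (0.00000, 23.3210, 151.9233),
    (0.08713, 21.7959, 152.0562),
    (0.17846, 20.0754, 152.3632),
    (0.26065, 18.1346, 152.8691),
    (0.33463, 15.9452, 153.6017),
    (0.40120, 13.4753, 154.5927),
    (0.46111, 10.6890, 155.8774),
    (0.51502, 7.5458, 157.4962),
    (0.56355, 4.0000, 159.4945),
    (0.60721, 0.0, 161.9233),  # constant cost on the last interval
]


def optimal_cost(p, segments=SEGMENTS_13):
    """Evaluate the piecewise linear cost V_beta(p) for p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    lefts = [left for left, _, _ in segments]
    i = bisect_right(lefts, p) - 1  # index of the segment containing p
    _, slope, intercept = segments[i]
    return slope * p + intercept


def max_jump(segments=SEGMENTS_13):
    """Largest mismatch between adjacent segments at their common breakpoint."""
    jumps = []
    for (_, m0, b0), (x, m1, b1) in zip(segments, segments[1:]):
        jumps.append(abs((m0 * x + b0) - (m1 * x + b1)))
    return max(jumps)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.6, 0.9):
        print(p, optimal_cost(p))
    print("largest breakpoint mismatch:", max_jump())
```

Because evaluation is a table lookup followed by one multiplication, repeating it for a perturbed parameter only requires a new table of segments, not a new grid computation.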

Now consider the case when θ changes to θ = 0.09. The optimal policy is: produce
for p ∈ [0, 0.4874221) and p ∈ [0.5772912, 0.5773791), inspect for
p ∈ [0.4874221, 0.5772912), replace for p ∈ [0.5773791, 1], and the optimal cost is given by:


$$
V_\beta(p) =
\begin{cases}
25.71507\,p + 143.96777, & p \in [0.0000000, 0.0080949) \\
24.22611\,p + 143.97982, & p \in [0.0080949, 0.0973664) \\
22.56497\,p + 144.14156, & p \in [0.0973664, 0.1786034) \\
20.71174\,p + 144.47256, & p \in [0.1786034, 0.2525291) \\
18.64421\,p + 144.99467, & p \in [0.2525291, 0.3198015) \\
16.33760\,p + 145.73232, & p \in [0.3198015, 0.3810193) \\
13.76427\,p + 146.71281, & p \in [0.3810193, 0.4367276) \\
10.89336\,p + 147.96661, & p \in [0.4367276, 0.4874221) \\
7.69048\,p + 149.52777, & p \in [0.4874221, 0.5772912) \\
4.00000\,p + 151.65825, & p \in [0.5772912, 0.5773791) \\
153.96777, & p \in [0.5773791, 1.0000000]
\end{cases}
\tag{14}
$$

The optimal policy for this example still has four regions if θ changes to any value in
[0.09, 0.10). However, note that the number of line segments describing the optimal cost
function changes for different values of θ. Also, note the size of the interval for which
the line segment with formula 4.0p + 151.65825 is specified in (14). We compared these
results with those given by the successive approximations algorithm, which reproduced
them only with great difficulty. Unless one knows in advance the structure of the
stationary, infinite horizon optimal policy, it is very difficult to decide when the
optimal policy has been reached (and hence to decide when to stop the computational
procedure), even after several choices of the grid have been tested (with the
corresponding time consumption involved).
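
The difficulty with a grid can be made concrete with a small sketch. It checks how many points of a uniform grid on [0, 1] fall inside each policy region found for θ = 0.09. The grid size of 1001 points is an assumption made here for illustration (it is the size quoted in Example 2), and the names `REGIONS_THETA_009`, `grid_points_inside` and `min_grid_size_to_resolve` are hypothetical.

```python
import math

# Policy regions for theta = 0.09, taken from the text: (action, left, right).
# Every region is half-open [left, right) except the last, which includes p = 1.
REGIONS_THETA_009 = [
    ("produce", 0.0, 0.4874221),
    ("inspect", 0.4874221, 0.5772912),
    ("produce", 0.5772912, 0.5773791),
    ("replace", 0.5773791, 1.0),
]


def grid_points_inside(lo, hi, n_points):
    """Count points k/(n_points - 1), k = 0..n_points-1, with lo <= point < hi."""
    steps = n_points - 1
    return sum(1 for k in range(n_points) if lo <= k / steps < hi)


def min_grid_size_to_resolve(lo, hi):
    """Size of a uniform grid on [0, 1] whose spacing does not exceed hi - lo.

    With such a spacing, at least one grid point is guaranteed to fall in [lo, hi).
    """
    return math.ceil(1.0 / (hi - lo)) + 1


if __name__ == "__main__":
    for action, lo, hi in REGIONS_THETA_009:
        print(action, lo, hi, grid_points_inside(lo, hi, 1001))
    narrow = REGIONS_THETA_009[2]
    print("points needed:", min_grid_size_to_resolve(narrow[1], narrow[2]))
```

Under this assumption, no grid point lies in the second produce region, so a computation on that grid cannot distinguish the four-region policy from a three-region one; a uniform grid would need more than ten thousand points to guarantee a single sample in that interval.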

It is clear that the same kind of analysis carried out here can be done for any of the
other parameters of the problem (i.e., β, R, I and C). Thus the equations found in
this work can be used to obtain insight into the way the system responds to uncertainties,
and therefore adaptive policies can be designed (for example, to modify the value of
some of the parameters to compensate for an undesired change in some other parameters)
so that the system continues to perform in a preselected satisfactory way.

Example 2: Now consider the open loop case. Let β = 0.9999, θ = 0.1, C = 4.0 and
R = 10.0. Using algorithm (5) with a grid of 1001 points in the interval [0,1], one
obtains: for n = 1000, a* = 0.609, R = 2272.10; for n = 10000, a* = 0.588, R = 15077.17; and
for n = 15000, a* = 0.601, R = 18528.28. We note that the last case took 52 minutes of
CPU time (compared with less than a second when using the expressions derived here, on
the same computer under the same load). Although the values obtained may suffice when
solving the (initial) control problem (the actual values, obtained with the analytical
expressions, are R = 23882.62 and a* = 0.597167), it is apparent that a sensitivity
analysis would be not only expensive in terms of computer time (this is so for any
computational algorithm since p takes uncountably many values, and we are dealing with
the infinite horizon problem), but also hard to perform in the sense of detecting the
actual effect of the uncertainties on the optimal cost and policy.
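
A rough consistency check helps explain why even 15000 iterations fall short here. The sketch below assumes that the value 0.9999 in Example 2 is the discount factor β, that the quantities reported after n iterations are n-stage costs, and that the cost per stage is roughly constant, so that the first n stages carry a fraction 1 − βⁿ of the infinite horizon cost. None of this is taken from the original analysis; the function names are hypothetical.

```python
import math

BETA = 0.9999              # assumed to be the discount factor of Example 2
ANALYTIC_COST = 23882.62   # infinite horizon value quoted in the text
REPORTED = {1000: 2272.10, 10000: 15077.17, 15000: 18528.28}  # n -> reported value


def discounted_weight_fraction(beta, n):
    """Fraction of the total discounted weight sum(beta**k) carried by stages 0..n-1."""
    return 1.0 - beta ** n


def stages_for_fraction(beta, fraction):
    """Smallest n for which 1 - beta**n is at least the given fraction."""
    return math.ceil(math.log(1.0 - fraction) / math.log(beta))


if __name__ == "__main__":
    for n, cost in REPORTED.items():
        observed = cost / ANALYTIC_COST
        predicted = discounted_weight_fraction(BETA, n)
        print(n, round(observed, 4), round(predicted, 4))
    print("stages for 99% of the weight:", stages_for_fraction(BETA, 0.99))
```

Under these assumptions the reported values track 1 − βⁿ closely, which suggests that the slow convergence is mainly a consequence of the discount factor being so close to one.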

5.  CONCLUSIONS 

The analysis of the successive approximations algorithm used to solve the functional
equation satisfied by the optimal cost associated with the problems described in Section 3
allowed us to prove the piecewise linearity of the optimal cost function, and to develop
analytical expressions to compute both the infinite horizon optimal cost and the
stationary, infinite horizon optimal policy structure. These results, in turn, permit the
performance of a sensitivity analysis for the optimal cost and policy with respect to any
of the parameters of the problem.

The  examples  in  Section  4  suggest  that  for  the  study  of  changes  in  the  optimal  policy 
due  to  small  changes  in  the  parameters  of  the  model,  better  results  can  be  obtained  if 
structural  properties  of  the  policies  and  the  cost  are  taken  into  account  in  the  design  of 
computational  procedures.  Since  the  strictly  PO  case  is  difficult  to  treat  analytically,  the 
study  of  structural  properties  of  the  optimal  cost  and  the  optimal  policy  for  special  cases 
of  the  strictly  PO  case,  like  those  considered  here,  is  justified  as  a  way  to  approach  the 
strictly PO problem.

The  development  of  similar  results  for  more  complex  problems  like  the  study  of  the 
strictly  PO  case,  higher  dimensional  models,  and  the  average  cost  case,  is  currently  being 
investigated. 